ABSTRACT


An inequality is proved and its interpretation is given. Using
the inequality, it is shown, under some mild conditions, that for
univariate truncated distributions the variance of the truncated
distribution increases with the value of the truncation point.
1. INTRODUCTION


The properties of truncated distributions for various families of
probability densities have been well discussed in the literature. Also
well known are the expressions for the mean, variance and higher order
moments of truncated distributions corresponding to certain families.
Johnson and Kotz [1] present an excellent account of these properties
in almost every chapter of their four-volume reference work on
statistical distributions. In this report, we first derive a
probability inequality and then, using this inequality, obtain a
property of the variance of the subpopulation obtained by truncating
the superpopulation between two points, for a certain family of density
functions satisfying some mild conditions.
2. AN INEQUALITY


We start with the notation. Let $X$ be a random variable with
probability density function $f(\cdot) > 0$ and let $F(\cdot)$ be the
cumulative distribution function of $X$. We further assume that $X$
admits first and second moments.

Let $0 \le a < b$ be any two points. The probability density of $X$ in
the truncated region $a \le x \le b$ is given by

\[
g(x) = \frac{f(x)}{F(b) - F(a)}, \qquad 0 \le a \le x \le b, \tag{2.1a}
\]

and therefore the mean $m$ and variance $v$ of the truncated
distribution are readily seen to be

\[
m = \int_a^b x\, g(x)\, dx, \tag{2.1b}
\]

\[
v = \int_a^b x^2 g(x)\, dx - m^2. \tag{2.1c}
\]
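
The definitions (2.1a) to (2.1c) can be evaluated numerically for any
density. The following is a minimal sketch, not part of the original
report: the helper names `simpson` and `truncated_moments` are
hypothetical, and the density $f(x) = e^{-x}$ is an assumption chosen
only because its truncated moments have simple closed forms to compare
against.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

def truncated_moments(f, a, b):
    """Return (m, v) for the truncated density g(x) = f(x) / (F(b) - F(a)) on [a, b]."""
    mass = simpson(f, a, b)                                  # F(b) - F(a)
    m = simpson(lambda x: x * f(x), a, b) / mass             # equation (2.1b)
    second = simpson(lambda x: x * x * f(x), a, b) / mass    # second moment of g
    return m, second - m * m                                 # equation (2.1c)

# Illustrative density (an assumption, not from the report): f(x) = exp(-x), x >= 0.
m, v = truncated_moments(lambda x: math.exp(-x), 0.0, 1.0)
print(m, v)
```

Normalizing by the numerically integrated mass, rather than by a
separately supplied $F$, keeps the sketch usable when only the density
is available.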


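Before we prove the main inequality, we state and prove the
following lemma:

Lemma 1. Let $f(x) > 0$ be a continuous integrable density function.
Also, let $f(x)$ be a monotonically decreasing function of $x$ for
$x \ge 0$. Then,

\[
\int_{-c}^{c} y\, f(y+c+a)\, dy \le 0 \quad \text{for all } c \ge 0,\ a \ge 0. \tag{2.2}
\]

Proof. Consider

\[
\begin{aligned}
\int_{-c}^{c} y\, f(y+c+a)\, dy
&= \int_{-c}^{0} y\, f(y+c+a)\, dy + \int_{0}^{c} y\, f(y+c+a)\, dy \\
&= -\int_{0}^{c} y\, f(-y+c+a)\, dy + \int_{0}^{c} y\, f(y+c+a)\, dy \\
&= \int_{0}^{c} y\, \{ f(y+c+a) - f(-y+c+a) \}\, dy \\
&\le 0,
\end{aligned}
\]

as $f(y+c+a) \le f(-y+c+a)$ for all $a \ge 0$ and all $0 \le y \le c$.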
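
As a worked illustration of (2.2), consider the density
$f(x) = e^{-x}$, $x \ge 0$. This choice is an assumption made here for
convenience and does not appear in the report. The integral can then be
evaluated in closed form, and its sign follows from $\tanh c \le c$ for
$c \ge 0$:

\[
\begin{aligned}
\int_{-c}^{c} y\, e^{-(y+c+a)}\, dy
&= e^{-(c+a)} \Big[ -(y+1)\, e^{-y} \Big]_{-c}^{c} \\
&= e^{-(c+a)} \{ (1-c)\, e^{c} - (1+c)\, e^{-c} \} \\
&= 2\, e^{-(c+a)} \cosh c\, (\tanh c - c) \le 0.
\end{aligned}
\]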


We are now in a position to prove our inequality, which we state in
the following lemma:

Lemma 2. Let $0 \le a < b$ be such that $F(b) - F(a) = \alpha$ is
fixed, and let $f(\cdot)$ be as defined in Lemma 1. Then

\[
\frac{a+b}{2} \ge \frac{1}{\alpha} \int_a^b x\, f(x)\, dx. \tag{2.3}
\]

Proof. We define $y = x - \frac{a+b}{2}$. Then the right hand side can
be written as

\[
\begin{aligned}
\frac{1}{\alpha} \int_{-\frac{b-a}{2}}^{\frac{b-a}{2}}
\left( y + \frac{a+b}{2} \right) f\left( y + \frac{a+b}{2} \right) dy
&= \frac{1}{\alpha} \int_{-\frac{b-a}{2}}^{\frac{b-a}{2}}
y\, f\left( y + \frac{a+b}{2} \right) dy \\
&\quad + \frac{a+b}{2\alpha} \int_{-\frac{b-a}{2}}^{\frac{b-a}{2}}
f\left( y + \frac{a+b}{2} \right) dy.
\end{aligned}
\]

The first integral in the above expression is nonpositive using
Lemma 1 with $c = \frac{b-a}{2}$, while the second integral is easily
seen to be equal to $\alpha$ (by writing it again in terms of the
original variable $x$). Hence, (2.3) is established.
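
The inequality (2.3) can be checked numerically for a particular
decreasing density. The sketch below is an illustration only: it
assumes $f(x) = e^{-x}$ on $x \ge 0$ (a choice made here, not in the
report), for which $F(x) = 1 - e^{-x}$ and the integral of $x f(x)$ has
the antiderivative $-(x+1)e^{-x}$. The function names are hypothetical.

```python
import math

def upper_point(a, alpha):
    """Solve F(b) - F(a) = alpha for b when F(x) = 1 - exp(-x)."""
    return -math.log(math.exp(-a) - alpha)

def truncated_mean(a, b, alpha):
    """(1/alpha) * integral_a^b x exp(-x) dx, via the antiderivative -(x + 1) exp(-x)."""
    return ((a + 1) * math.exp(-a) - (b + 1) * math.exp(-b)) / alpha

alpha = 0.2  # fixed truncation proportion; requires exp(-a) > alpha
for a in (0.0, 0.5, 1.0, 1.5):
    b = upper_point(a, alpha)
    # columns: a, b, truncated mean (right side of 2.3), midpoint (left side of 2.3)
    print(a, round(b, 4), round(truncated_mean(a, b, alpha), 4), round((a + b) / 2, 4))
```

In each printed row the truncated mean should not exceed the midpoint
$(a+b)/2$, as Lemma 2 asserts.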

Remarks. 1. We will first interpret the inequality (2.3). We notice
that the right hand side of (2.3) is the mean of the truncated random
variable $X$, $0 \le a \le x \le b$ (see (2.1b)). Hence the inequality
states that the mean of the truncated distribution is never more than
the average of the truncation points, under the assumptions already
stated.

2. In case $X$ was originally distributed as standard normal,
(2.3) reduces to another interesting inequality

\[
\frac{a+b}{2} \ge \frac{\phi(a) - \phi(b)}{\Phi(b) - \Phi(a)},
\qquad 0 < a < b, \tag{2.4}
\]

where $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the ordinate and
c.d.f. of the standard normal distribution.

(2.4) has another interesting interpretation: if we consider $\phi$ as
a function of $\Phi$, then by the mean value theorem there exists a
$\gamma$, $\Phi(a) < \gamma < \Phi(b)$, such that

\[
\frac{\phi(a) - \phi(b)}{\Phi(b) - \Phi(a)}
= -\left. \frac{\partial \phi}{\partial \Phi} \right|_{\Phi(\cdot) = \gamma}
= \Phi^{-1}(\gamma) = d, \text{ say.}
\]

(2.4) states that such a $d$, corresponding to the $\gamma$ of the mean
value theorem, will always be less than or equal to the midpoint of $a$
and $b$.

A different proof of (2.4) has been suggested by
Dr. Nitish Mukhopadhyaya of Oklahoma State University in a personal
communication.
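
The normal-case inequality (2.4) and its mean value interpretation can
be illustrated with Python's standard `statistics.NormalDist`. This is
a sketch under the stated setting only; the function name
`mean_value_point` and the chosen pairs $(a, b)$ are hypothetical.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi

def mean_value_point(a, b):
    """d = (phi(a) - phi(b)) / (Phi(b) - Phi(a)), the right-hand side of (2.4)."""
    return (N.pdf(a) - N.pdf(b)) / (N.cdf(b) - N.cdf(a))

for a, b in [(0.1, 0.5), (0.5, 2.0), (1.0, 3.0)]:
    d = mean_value_point(a, b)
    gamma = N.cdf(d)  # the intermediate value with Phi(a) < gamma < Phi(b)
    # columns: a, b, d, gamma, midpoint of a and b
    print(a, b, round(d, 4), round(gamma, 4), (a + b) / 2)
```

For each pair, $d$ should lie between $a$ and the midpoint $(a+b)/2$.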

3. In case $f(\cdot)$ is monotonically increasing, the direction
of the inequalities in (2.2), (2.3) and (2.4) will be reversed. A
similar proof goes through with trivial changes.
3. THE VARIANCE OF THE TRUNCATED DISTRIBUTIONS


Our next result is about the effect of different truncations,
but of the same proportion, on the variance of the subpopulation
obtained after truncation. The result shows that if a fixed proportion
of the original population is truncated by points $a$ and $b$,
$0 \le a < b$, such that $F(b) - F(a) = \alpha$, a constant, then the
truncated subpopulation becomes more and more diverse as we move away
from the origin, under some mild conditions. We formally state this
result in the following theorem:


Theorem. Let $X$, $f(\cdot)$, $F(\cdot)$, $a$, $b$ and $\alpha$ be as
in Lemma 2. Then $v$, the variance of $X$ in the truncated population,
as a function of $a$ (and hence of $b$ as well), is a monotonically
increasing function for $a \ge 0$.
Proof. To prove the theorem, it is enough to show that the
derivative of the variance of the truncated population with respect to
$a$ is nonnegative.

Note that

\[
\int_a^b f(x)\, dx = \alpha, \tag{3.1}
\]

which implies that

\[
\frac{\partial b}{\partial a} = \frac{f(a)}{f(b)}. \tag{3.2}
\]

Now using (2.1b) and (2.1c), the variance as a function of $a$ is

\[
v_x(a) = \frac{1}{\alpha} \int_a^b x^2 f(x)\, dx
- \left( \frac{1}{\alpha} \int_a^b x\, f(x)\, dx \right)^2. \tag{3.3}
\]

Therefore, using (3.2), we have

\[
\begin{aligned}
\frac{\partial v_x(a)}{\partial a}
&= \frac{1}{\alpha} \left( b^2 f(b)\, \frac{f(a)}{f(b)} - a^2 f(a) \right)
- \frac{2}{\alpha^2} \left( \int_a^b x\, f(x)\, dx \right)
\left( b\, f(b)\, \frac{f(a)}{f(b)} - a\, f(a) \right) \\
&= \frac{1}{\alpha}\, f(a)\, (b-a)
\left\{ (a+b) - \frac{2}{\alpha} \int_a^b x\, f(x)\, dx \right\}.
\end{aligned}
\]

Note that, as $b > a$, the quantity outside the braces is positive,
while that within the braces is, using Lemma 2, nonnegative. Hence

\[
\frac{\partial v_x(a)}{\partial a} \ge 0,
\]

which proves our theorem.
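
The theorem and the closed form of the derivative obtained in its proof
can be checked numerically. The sketch below again assumes the density
$f(x) = e^{-x}$ on $x \ge 0$, which is an illustrative choice and not
taken from the report; all function names are hypothetical. It compares
a central finite difference of the variance with the expression
$\frac{1}{\alpha} f(a)(b-a)\{(a+b) - 2m\}$, where $m$ is the truncated
mean.

```python
import math

def upper_point(a, alpha):
    """Solve F(b) - F(a) = alpha for b when F(x) = 1 - exp(-x)."""
    return -math.log(math.exp(-a) - alpha)

def moments(a, alpha):
    """Return (b, m, v) for f(x) = exp(-x) truncated to [a, b] with mass alpha."""
    b = upper_point(a, alpha)
    # Antiderivatives: x e^{-x} -> -(x + 1) e^{-x};  x^2 e^{-x} -> -(x^2 + 2x + 2) e^{-x}
    first = ((a + 1) * math.exp(-a) - (b + 1) * math.exp(-b)) / alpha
    second = ((a * a + 2 * a + 2) * math.exp(-a)
              - (b * b + 2 * b + 2) * math.exp(-b)) / alpha
    return b, first, second - first * first

def derivative_formula(a, alpha):
    """(1/alpha) f(a) (b - a) {(a + b) - 2 m}, the closed form from the proof."""
    b, m, _ = moments(a, alpha)
    return math.exp(-a) * (b - a) * ((a + b) - 2 * m) / alpha

alpha, h = 0.2, 1e-5  # fixed proportion and finite-difference step
for a in (0.1, 0.4, 0.8, 1.2):
    v = moments(a, alpha)[2]
    numeric = (moments(a + h, alpha)[2] - moments(a - h, alpha)[2]) / (2 * h)
    # columns: a, variance, finite-difference slope, closed-form slope
    print(a, v, numeric, derivative_formula(a, alpha))
```

The variance column should increase with $a$, and the two slope columns
should agree up to finite-difference error.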

Remarks. 1. The theorem can easily be stated for monotonically
increasing $f(\cdot)$ with trivial changes.

2. As a corollary, it can be seen that for any probability
density symmetric about zero, the variance of any $\alpha$-truncation
is an increasing function of $|a|$.




4. SOME APPLICATIONS


Usually in the problem of genetic selection, selection is made to
maximize the average of the unobserved or unobservable criterion
variable, but it is made on the basis of observed values of predictors.
If we denote the criterion variable by $y$ and the regression of the
criterion on all the predictors by $\eta$, then it is well known that
the best strategy is to select all those for which

\[
\eta \ge k, \tag{4.1}
\]

where $k$ is chosen in such a way that the proportion of the selected
population is $\alpha$, a predecided value between 0 and 1.

If we assume that all the predictors and the criterion are, in the
original population, distributed jointly as multivariate normal with
zero mean, then $\eta$ will also be normally distributed with zero mean.
Writing $\sigma_y^2$ and $\sigma_\eta^2$ for the variances of $y$ and
$\eta$ respectively in the original population, and $W$ for a truncated
region on the $\eta$ axis, we have

\[
V(y \mid \eta \in W) = V(E(y \mid \eta) \mid \eta \in W)
+ E(V(y \mid \eta) \mid \eta \in W)
\]

or, since $E(y \mid \eta) = \eta$ and
$V(y \mid \eta) = \sigma_y^2 - \sigma_\eta^2$,

\[
V(y \mid \eta \in W) = V(\eta \mid \eta \in W) + \sigma_y^2 - \sigma_\eta^2. \tag{4.2}
\]

(4.2) shows that $V(y \mid \eta \in W)$ and $V(\eta \mid \eta \in W)$
differ only by a constant for any region $W$ on the $\eta$-axis. Now if
our selection policy is as in (4.1), it leads to an $\alpha$-proportion
subpopulation which, even though it maximizes the mean of the criterion
variable, is also the most diverse with respect to it. If too much
variability is to be avoided and one seeks a region $W$ for which
$V(\eta \mid \eta \in W) \le \epsilon$, a prespecified quantity, then
the region $W$ maximizing the mean subject to the above constraint
would be

\[
W^*: k_1 \le \eta \le k_2, \tag{4.3a}
\]

so that

\[
V(\eta \mid \eta \in W^*) = \epsilon \tag{4.3b}
\]

and

\[
P(k_1 \le \eta \le k_2) = \alpha. \tag{4.3c}
\]
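
The conditions (4.3a) to (4.3c) determine $k_1$ and $k_2$ only
implicitly. One way they might be solved numerically is by bisection,
sketched below under several assumptions that are not in the report:
$\eta$ is expressed in standard units ($\sigma_\eta = 1$, so $\epsilon$
is in units of $\sigma_\eta^2$), the search is restricted to
$k_1 \ge 0$ with $\alpha < 0.5$ so that the theorem applies, and
$\epsilon$ lies between the variances at the two ends of that range.
The function names are hypothetical; the variance expression is the
standard one for a doubly truncated normal.

```python
from statistics import NormalDist

N = NormalDist()  # eta in standard units (assumption: sigma_eta = 1)

def interval_variance(a, b):
    """Variance of a standard normal variable truncated to [a, b]."""
    z = N.cdf(b) - N.cdf(a)
    mean = (N.pdf(a) - N.pdf(b)) / z
    return 1.0 + (a * N.pdf(a) - b * N.pdf(b)) / z - mean * mean

def selection_interval(alpha, eps, iters=100):
    """Find 0 <= k1 < k2 with P(k1 <= eta <= k2) = alpha and truncated variance eps."""
    lo, hi = 0.5, 1.0 - alpha - 1e-12   # bracket for Phi(k1); Phi(k1) >= 0.5 means k1 >= 0
    for _ in range(iters):
        p = (lo + hi) / 2
        k1, k2 = N.inv_cdf(p), N.inv_cdf(p + alpha)
        if interval_variance(k1, k2) < eps:
            lo = p                       # variance too small: move the interval outward
        else:
            hi = p                       # variance too large: move the interval inward
    p = (lo + hi) / 2
    return N.inv_cdf(p), N.inv_cdf(p + alpha)

k1, k2 = selection_interval(alpha=0.2, eps=0.1)
print(k1, k2, interval_variance(k1, k2))
```

Bisection is appropriate here because, by the theorem, the truncated
variance is monotone in $k_1$ over the search range, and the truncated
mean increases as the interval moves outward, so the rightmost interval
satisfying the variance constraint is the one attaining equality.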

Of course, to control the variability, one has to sacrifice some
of the individual units with high values of the criterion variable.

There may be a situation where, for further experiments, the whole
population is to be divided into several groups equal in size on the
basis of means of the criterion variable. The theorem says that these
groups will differ not only in their mean values but also in the
amount of variability, and one should possibly take this fact into
account while planning further experiments.